Creating new pipeline using seurat v4.0.2 available 2021.06.08
Load libraries required for Seuratv4
knitr::opts_knit$set(root.dir = "~/Desktop/10XGenomicsData/msAggr_scRNASeq/")
library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
library(Seurat)
Registered S3 methods overwritten by 'htmltools':
method from
print.html tools:rstudio
print.shiny.tag tools:rstudio
print.shiny.tag.list tools:rstudio
Registered S3 method overwritten by 'data.table':
method from
print.data.table
Registered S3 method overwritten by 'htmlwidgets':
method from
print.htmlwidget tools:rstudio
Attaching SeuratObject
library(patchwork)
library(ggplot2)
library(clustree)
Loading required package: ggraph
store session info
sink("msAggr_sct-v1.20210615")
sessionInfo()
sink()
ScaleDatahttps://bioconductor.org/packages/3.10/workflows/vignettes/simpleSingleCell/inst/doc/batch.html#62_for_gene-based_analyses >You can also normalize and scale data for the RNA assay. There are numerous resources on this, but Aaron Lun describes why the original log-normalized values should be used for DE and visualizations of expression quite well here: > >For gene-based procedures like differential expression (DE) analyses or gene network construction, it is desirable to use the original log-expression values or counts. The corrected values are only used to obtain cell-level results such as clusters or trajectories. Batch effects are handled explicitly using blocking terms or via a meta-analysis across batches. We do not use the corrected values directly in gene-based analyses, for various reasons: > >It is usually inappropriate to perform DE analyses on batch-corrected values, due to the failure to model the uncertainty of the correction. This usually results in loss of type I error control, i.e., more false positives than expected. > >The correction does not preserve the mean-variance relationship. Applications of common DE methods like edgeR or limma are unlikely to be valid. > >Batch correction may (correctly) remove biological differences between batches in the course of mapping all cells onto a common coordinate system. Returning to the uncorrected expression values provides an opportunity for detecting such differences if they are of interest. Conversely, if the batch correction made a mistake, the use of the uncorrected expression values provides an important sanity check. > >In addition, the normalized values in SCT and integrated assays don’t necessary correspond to per-gene expression values anyway, rather containing residuals (in the case of the scale.data slot for each).
Load 4 populations into single seurat object
SET SEED?????!!!!!
projectName <- "msAggr_sct"
jackstraw.dim <- 40
source("msAggr_AnalysisCode/read_10XGenomics_data.R")
setwd("../cellRanger/") # temporarily changing wd only works if you run the entire chunk at once
Warning: The working directory was changed to /Users/heustonef/Desktop/10XGenomicsData/cellRanger inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the working directory for notebook chunks.
data_file.list <- read_10XGenomics_data(sample.list = c("LSKm2", "CMPm2", "MEPm", "GMPm"))
seurat.object.data<-Read10X(data_file.list)
seurat.object<- CreateSeuratObject(counts = seurat.object.data, min.cells = 3, min.genes = 200, project = projectName)
Clean up to free memory
remove(seurat.object.data)
Add mitochondrial metadata and plot some basic features
seurat.object[["percent.mt"]] <- PercentageFeatureSet(seurat.object, pattern = "^mt-")
VlnPlot(seurat.object, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, pt.size = 0, fill.by = 'orig.ident', )
plot1 <- FeatureScatter(seurat.object, feature1 = "nCount_RNA", feature2 = "percent.mt", group.by = "orig.ident", pt.size = 0.01)
plot2 <- FeatureScatter(seurat.object, feature1 = "nCount_RNA", feature2 = "nFeature_RNA", group.by = "orig.ident", pt.size = 0.01)
plot1 + plot2
remove low quality cells ** require: nFeature_RNA between 200 and 4000 (inclusive) –require: percent.mt <5–???
** In previous run it’s possible that we’ll need to limit percent.mt because of MEPm weirdness. Will have to see if the additional 2000 MEPm cells found by cellranger v6
print(paste("original object:", nrow(seurat.object@meta.data), "cells", sep = " "))
[1] "original object: 40815 cells"
seurat.object <- subset(seurat.object,
subset = nFeature_RNA >=200 &
nFeature_RNA <= 4000
)
print(paste("new object:", nrow(seurat.object@meta.data), "cells", sep = " "))
[1] "new object: 40815 cells"
seurat.object <- PercentageFeatureSet(seurat.object, pattern = '^mt-', col.name = "percent.mt")
As previously mentioned, SCTransform() is best for batch correction, whereas NormalizeData(), ScaleData(), and FindVariableFeatures() are best for DGE. I’m struggling to make stable clusters, so I’m jumping ahead to SCTransform().
It does appear that SCTransform() data are stored in the SCT assay slot, which does not overwrite other normalization slots. It will become the default slot once populated.
seurat.object <- SCTransform(seurat.object,
method = "glmGamPoi",
vars.to.regress = c("nCount_RNA", "nFeature_RNA")
) # method = "glmGamPoi" speeds process
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 17709 by 40815
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 5000 cells
|
| | 0%
|
|============================= | 25%
|
|========================================================== | 50%
|
|====================================================================================== | 75%
|
|===================================================================================================================| 100%
Found 108 outliers - those will be ignored in fitting/regularization step
Second step: Get residuals using fitted parameters for 17709 genes
|
| | 0%
|
|=== | 3%
|
|====== | 6%
|
|========== | 8%
|
|============= | 11%
|
|================ | 14%
|
|=================== | 17%
|
|====================== | 19%
|
|========================== | 22%
|
|============================= | 25%
|
|================================ | 28%
|
|=================================== | 31%
|
|====================================== | 33%
|
|========================================== | 36%
|
|============================================= | 39%
|
|================================================ | 42%
|
|=================================================== | 44%
|
|====================================================== | 47%
|
|========================================================== | 50%
|
|============================================================= | 53%
|
|================================================================ | 56%
|
|=================================================================== | 58%
|
|====================================================================== | 61%
|
|========================================================================= | 64%
|
|============================================================================= | 67%
|
|================================================================================ | 69%
|
|=================================================================================== | 72%
|
|====================================================================================== | 75%
|
|========================================================================================= | 78%
|
|============================================================================================= | 81%
|
|================================================================================================ | 83%
|
|=================================================================================================== | 86%
|
|====================================================================================================== | 89%
|
|========================================================================================================= | 92%
|
|============================================================================================================= | 94%
|
|================================================================================================================ | 97%
|
|===================================================================================================================| 100%
Computing corrected count matrix for 17709 genes
|
| | 0%
|
|=== | 3%
|
|====== | 6%
|
|========== | 8%
|
|============= | 11%
|
|================ | 14%
|
|=================== | 17%
|
|====================== | 19%
|
|========================== | 22%
|
|============================= | 25%
|
|================================ | 28%
|
|=================================== | 31%
|
|====================================== | 33%
|
|========================================== | 36%
|
|============================================= | 39%
|
|================================================ | 42%
|
|=================================================== | 44%
|
|====================================================== | 47%
|
|========================================================== | 50%
|
|============================================================= | 53%
|
|================================================================ | 56%
|
|=================================================================== | 58%
|
|====================================================================== | 61%
|
|========================================================================= | 64%
|
|============================================================================= | 67%
|
|================================================================================ | 69%
|
|=================================================================================== | 72%
|
|====================================================================================== | 75%
|
|========================================================================================= | 78%
|
|============================================================================================= | 81%
|
|================================================================================================ | 83%
|
|=================================================================================================== | 86%
|
|====================================================================================================== | 89%
|
|========================================================================================================= | 92%
|
|============================================================================================================= | 94%
|
|================================================================================================================ | 97%
|
|===================================================================================================================| 100%
Calculating gene attributes
Wall clock passed: Time difference of 2.660115 mins
Determine variable features
Place corrected count matrix in counts slot
Regressing out nCount_RNA, nFeature_RNA
|
| | 0%
|
|= | 0%
|
|= | 1%
|
|== | 1%
|
|== | 2%
|
|=== | 2%
|
|=== | 3%
|
|==== | 3%
|
|==== | 4%
|
|===== | 4%
|
|===== | 5%
|
|====== | 5%
|
|====== | 6%
|
|======= | 6%
|
|======== | 7%
|
|========= | 7%
|
|========= | 8%
|
|========== | 8%
|
|========== | 9%
|
|=========== | 9%
|
|=========== | 10%
|
|============ | 10%
|
|============ | 11%
|
|============= | 11%
|
|============= | 12%
|
|============== | 12%
|
|============== | 13%
|
|=============== | 13%
|
|================ | 14%
|
|================= | 14%
|
|================= | 15%
|
|================== | 15%
|
|================== | 16%
|
|=================== | 16%
|
|=================== | 17%
|
|==================== | 17%
|
|==================== | 18%
|
|===================== | 18%
|
|===================== | 19%
|
|====================== | 19%
|
|====================== | 20%
|
|======================= | 20%
|
|======================== | 20%
|
|======================== | 21%
|
|========================= | 21%
|
|========================= | 22%
|
|========================== | 22%
|
|========================== | 23%
|
|=========================== | 23%
|
|=========================== | 24%
|
|============================ | 24%
|
|============================ | 25%
|
|============================= | 25%
|
|============================= | 26%
|
|============================== | 26%
|
|=============================== | 27%
|
|================================ | 27%
|
|================================ | 28%
|
|================================= | 28%
|
|================================= | 29%
|
|================================== | 29%
|
|================================== | 30%
|
|=================================== | 30%
|
|=================================== | 31%
|
|==================================== | 31%
|
|==================================== | 32%
|
|===================================== | 32%
|
|===================================== | 33%
|
|====================================== | 33%
|
|======================================= | 34%
|
|======================================== | 34%
|
|======================================== | 35%
|
|========================================= | 35%
|
|========================================= | 36%
|
|========================================== | 36%
|
|========================================== | 37%
|
|=========================================== | 37%
|
|=========================================== | 38%
|
|============================================ | 38%
|
|============================================ | 39%
|
|============================================= | 39%
|
|============================================= | 40%
|
|============================================== | 40%
|
|=============================================== | 40%
|
|=============================================== | 41%
|
|================================================ | 41%
|
|================================================ | 42%
|
|================================================= | 42%
|
|================================================= | 43%
|
|================================================== | 43%
|
|================================================== | 44%
|
|=================================================== | 44%
|
|=================================================== | 45%
|
|==================================================== | 45%
|
|==================================================== | 46%
|
|===================================================== | 46%
|
|====================================================== | 47%
|
|======================================================= | 47%
|
|======================================================= | 48%
|
|======================================================== | 48%
|
|======================================================== | 49%
|
|========================================================= | 49%
|
|========================================================= | 50%
|
|========================================================== | 50%
|
|========================================================== | 51%
|
|=========================================================== | 51%
|
|=========================================================== | 52%
|
|============================================================ | 52%
|
|============================================================ | 53%
|
|============================================================= | 53%
|
|============================================================== | 54%
|
|=============================================================== | 54%
|
|=============================================================== | 55%
|
|================================================================ | 55%
|
|================================================================ | 56%
|
|================================================================= | 56%
|
|================================================================= | 57%
|
|================================================================== | 57%
|
|================================================================== | 58%
|
|=================================================================== | 58%
|
|=================================================================== | 59%
|
|==================================================================== | 59%
|
|==================================================================== | 60%
|
|===================================================================== | 60%
|
|====================================================================== | 60%
|
|====================================================================== | 61%
|
|======================================================================= | 61%
|
|======================================================================= | 62%
|
|======================================================================== | 62%
|
|======================================================================== | 63%
|
|========================================================================= | 63%
|
|========================================================================= | 64%
|
|========================================================================== | 64%
|
|========================================================================== | 65%
|
|=========================================================================== | 65%
|
|=========================================================================== | 66%
|
|============================================================================ | 66%
|
|============================================================================= | 67%
|
|============================================================================== | 67%
|
|============================================================================== | 68%
|
|=============================================================================== | 68%
|
|=============================================================================== | 69%
|
|================================================================================ | 69%
|
|================================================================================ | 70%
|
|================================================================================= | 70%
|
|================================================================================= | 71%
|
|================================================================================== | 71%
|
|================================================================================== | 72%
|
|=================================================================================== | 72%
|
|=================================================================================== | 73%
|
|==================================================================================== | 73%
|
|===================================================================================== | 74%
|
|====================================================================================== | 74%
|
|====================================================================================== | 75%
|
|======================================================================================= | 75%
|
|======================================================================================= | 76%
|
|======================================================================================== | 76%
|
|======================================================================================== | 77%
|
|========================================================================================= | 77%
|
|========================================================================================= | 78%
|
|========================================================================================== | 78%
|
|========================================================================================== | 79%
|
|=========================================================================================== | 79%
|
|=========================================================================================== | 80%
|
|============================================================================================ | 80%
|
|============================================================================================= | 80%
|
|============================================================================================= | 81%
|
|============================================================================================== | 81%
|
|============================================================================================== | 82%
|
|=============================================================================================== | 82%
|
|=============================================================================================== | 83%
|
|================================================================================================ | 83%
|
|================================================================================================ | 84%
|
|================================================================================================= | 84%
|
|================================================================================================= | 85%
|
|================================================================================================== | 85%
|
|================================================================================================== | 86%
|
|=================================================================================================== | 86%
|
|==================================================================================================== | 87%
|
|===================================================================================================== | 87%
|
|===================================================================================================== | 88%
|
|====================================================================================================== | 88%
|
|====================================================================================================== | 89%
|
|======================================================================================================= | 89%
|
|======================================================================================================= | 90%
|
|======================================================================================================== | 90%
|
|======================================================================================================== | 91%
|
|========================================================================================================= | 91%
|
|========================================================================================================= | 92%
|
|========================================================================================================== | 92%
|
|========================================================================================================== | 93%
|
|=========================================================================================================== | 93%
|
|============================================================================================================ | 94%
|
|============================================================================================================= | 94%
|
|============================================================================================================= | 95%
|
|============================================================================================================== | 95%
|
|============================================================================================================== | 96%
|
|=============================================================================================================== | 96%
|
|=============================================================================================================== | 97%
|
|================================================================================================================ | 97%
|
|================================================================================================================ | 98%
|
|================================================================================================================= | 98%
|
|================================================================================================================= | 99%
|
|================================================================================================================== | 99%
|
|================================================================================================================== | 100%
|
|===================================================================================================================| 100%
Centering data matrix
|
| | 0%
|
|============================= | 25%
|
|========================================================== | 50%
|
|====================================================================================== | 75%
|
|===================================================================================================================| 100%
Set default assay to SCT
Tada! Now lets do some plotting and clustering!
linear dimensional reduction. Default are based on VariableFeatures, but can be changed
seurat.object <- RunPCA(seurat.object, features = VariableFeatures(object = seurat.object), verbose = FALSE)
VizDimLoadings(seurat.object, dims = 1:6, nfeatures = 10, reduction = "pca", ncol = 2)
DimPlot colored by orig.ident
DimPlot(seurat.object, reduction = "pca", group.by = "orig.ident")
source("msAggr_AnalysisCode/PercentVariance.R")
ElbowPlot(seurat.object, ndims = 50)
percent.variance(seurat.object@reductions$pca@stdev)
Number of PCs describing X% of variance
ElbowPlot(seurat.object, ndims = 50)
percent.variance(seurat.object@reductions$pca@stdev)
Cover 95% variability
tot.var <- percent.variance(seurat.object@reductions$pca@stdev, plot.var = FALSE, return.val = TRUE)
paste0("Num pcs for 80% variance:", length(which(cumsum(tot.var) <= 80)))
[1] "Num pcs for 80% variance:11"
paste0("Num pcs for 85% variance:", length(which(cumsum(tot.var) <= 85)))
[1] "Num pcs for 85% variance:15"
paste0("Num pcs for 90% variance:", length(which(cumsum(tot.var) <= 90)))
[1] "Num pcs for 90% variance:22"
paste0("Num pcs for 95% variance:", length(which(cumsum(tot.var) <= 95)))
[1] "Num pcs for 95% variance:33"
ndim <- 33
Plotting…
seurat.object <- RunUMAP(seurat.object, dims = 1:33)
16:50:51 UMAP embedding parameters a = 0.9922 b = 1.112
16:50:51 Read 40815 rows and found 33 numeric columns
16:50:51 Using Annoy for neighbor search, n_neighbors = 30
16:50:51 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
16:50:56 Writing NN index file to temp file /var/folders/4f/fwrj6fnn1dn4g8wsf0zv563hjsvl24/T//RtmpLSZxR7/file16fc1e0cf629
16:50:56 Searching Annoy index using 1 thread, search_k = 3000
16:51:07 Annoy recall = 100%
16:51:08 Commencing smooth kNN distance calibration using 1 thread
16:51:10 Initializing from normalized Laplacian + noise
16:51:12 Commencing optimization for 200 epochs, with 1697286 positive edges
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
16:51:32 Optimization finished
seurat.object <- FindNeighbors(seurat.object, dims = 1:33)
Computing nearest neighbor graph
Computing SNN
for(x in c(0.5, 1, 1.5, 2, 2.5)){
seurat.object <- FindClusters(seurat.object, resolution = x)
}
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 40815
Number of edges: 1349968
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.9113
Number of communities: 17
Elapsed time: 8 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 40815
Number of edges: 1349968
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8778
Number of communities: 27
Elapsed time: 8 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 40815
Number of edges: 1349968
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8566
Number of communities: 36
Elapsed time: 8 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 40815
Number of edges: 1349968
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8392
Number of communities: 46
Elapsed time: 8 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 40815
Number of edges: 1349968
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8250
Number of communities: 48
Elapsed time: 7 seconds
for (meta.col in colnames(seurat.object@meta.data)){
if(grepl(pattern = ("SCT_snn_res"), x = meta.col)==TRUE){
myplot <- DimPlot(seurat.object,
group.by = meta.col,
reduction = "umap",
cols = colorRamps::primary.colors(n = length(levels(seurat.object@meta.data[[meta.col]])))
) +
ggtitle(paste0("msAggr sct dim31 res", gsub("SCT_snn_res", "", meta.col) ))
plot(myplot)
}
}
Must ensure we have the right cluster stability, that is, cells that start in the same cluster tend to stay in the same cluster. If your data is over-clustered, cells will bounce between groups.
Following [this tutorial by Matt O.].https://towardsdatascience.com/10-tips-for-choosing-the-optimal-number-of-clusters-277e93d72d92.
Previously my favourite has been Clustree, which gives a nice visual NB: For some reason clustree::clustree() didn’t work, whereas library(clustree) followed by clustree() did.
saveRDS(seurat.object, file = "msAggr_sctdim33.rds")
Basically no stability difference compared to before.
Cover 90% variability
clustree(seurat.object, prefix = "SCT_snn_res.", node_colour = "sc3_stability") +
scale_color_continuous(low = 'red3', high = 'white')
ndim <- 22
Plotting…
seurat.object <- RunUMAP(seurat.object, dims = 1:ndim)
09:11:45 UMAP embedding parameters a = 0.9922 b = 1.112
09:11:45 Read 40815 rows and found 22 numeric columns
09:11:45 Using Annoy for neighbor search, n_neighbors = 30
09:11:45 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
09:11:49 Writing NN index file to temp file /var/folders/4f/fwrj6fnn1dn4g8wsf0zv563hjsvl24/T//RtmpLSZxR7/file16fc360e68c6
09:11:49 Searching Annoy index using 1 thread, search_k = 3000
09:11:59 Annoy recall = 100%
09:12:00 Commencing smooth kNN distance calibration using 1 thread
09:12:02 Initializing from normalized Laplacian + noise
09:12:05 Commencing optimization for 200 epochs, with 1662108 positive edges
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
*****************************
for (meta.col in colnames(seurat.object@meta.data)){
if(grepl(pattern = ("SCT_snn_res"), x = meta.col)==TRUE){
myplot <- DimPlot(seurat.object,
group.by = meta.col,
reduction = "umap",
cols = colorRamps::primary.colors(n = length(levels(seurat.object@meta.data[[meta.col]])))
) +
ggtitle(paste0("msAggr sct dim", ndim, "res", gsub("SCT_snn_res", "", meta.col) ))
plot(myplot)
}
}
Must ensure we have the right cluster stability, that is, cells that start in the same cluster tend to stay in the same cluster. If your data is over-clustered, cells will bounce between groups.
Following [this tutorial by Matt O.].https://towardsdatascience.com/10-tips-for-choosing-the-optimal-number-of-clusters-277e93d72d92.
Previously my favourite has been Clustree, which gives a nice visual NB: For some reason clustree::clustree() didn’t work, whereas library(clustree) followed by clustree() did.
saveRDS(seurat.object, file = paste0("msAggr_sctdim", ndim, ".rds"))
Hard to tell if dim33 or dim22 is better, but if feels like dim22 has less cross-over between res0.5 and res1.0. Will continue at dim22, and see how things change between res0.5 and res1.0
clustree(seurat.object, prefix = "SCT_snn_res.", node_colour = "sc3_stability") +
scale_color_continuous(low = 'red3', high = 'white')
for (x in c("LSKm2", "CMPm2", "MEPm", "GMPm")){
h = hist(seurat.object@meta.data[seurat.object@meta.data$orig.ident == x, 'percent.mt'], breaks = 30, plot = FALSE)
h$density = h$counts/sum(h$counts)*100
plot(h,freq=FALSE, main = paste(x, "percent mitoC"), xlab = "percent mitoC", ylab = "Frequency")
}